Look for patterns and relationships between features
Determine which features separate gender best
Classification
Used Scikit-learn in Python
Split the data for training/testing (1/3, 2/3)
Used grid search to identify the best parameters
KNN (K-Nearest Neighbors)
Decision Tree (DT)
Support Vector Machine (SVM)
Observed prediction outcomes. Could do better.
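The workflow above can be sketched roughly as follows. This is a minimal illustration, not the original analysis: the data is a synthetic stand-in generated with `make_classification`, the split is assumed to be two thirds for training and one third for testing, and the parameter grids are hypothetical, since the values actually searched are not listed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the voice data: 20 numeric features, binary label.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=0
)

# Assumption: two thirds for training, one third held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0
)

# Hypothetical parameter grids, chosen only to keep the example small.
candidates = {
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 9]}),
    "DT": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "SVM": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated search over the grid, using training data only.
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X_train, y_train)
    # Score the refit best estimator on the held-out test set.
    results[name] = (search.best_params_, search.score(X_test, y_test))
    print(name, results[name][0], round(results[name][1], 3))
```

Running the search on the training portion only keeps the test set untouched until the final score, so the reported accuracy is not inflated by parameter tuning.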
Attempt to improve on initial results
KNN: Transform data with PCA
Decision Tree: Use multiple trees with Random Forest
SVM: Transform data with PCA
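One way these three improvements might look in scikit-learn is shown below. It is a sketch under assumptions: the data is synthetic, the number of principal components (10) and the number of trees (100) are hypothetical, and the `StandardScaler` step is an addition not mentioned above, included because PCA is sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the voice data: 20 numeric features, binary label.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0
)

models = {
    # Scale, project onto 10 principal components, then classify.
    "KNN + PCA": make_pipeline(
        StandardScaler(), PCA(n_components=10), KNeighborsClassifier()
    ),
    "SVM + PCA": make_pipeline(StandardScaler(), PCA(n_components=10), SVC()),
    # Many decision trees voting together instead of a single tree.
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

Wrapping PCA in a pipeline means the projection is fitted on the training data only and then reused on the test data, which avoids leaking test information into the transform.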
Review
Confusion Matrix
Overall Accuracy Scores
Male Accuracy
Female Accuracy
Parameter Influence
Graph Results
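To make the review metrics concrete, here is a small pure-Python sketch of a confusion matrix and the three accuracy figures. The labels are made up for illustration, and "male accuracy" and "female accuracy" are assumed to mean the fraction of actual male (or female) samples that were predicted correctly. The function names are hypothetical; scikit-learn's `confusion_matrix` produces the same counts as an array.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels=("male", "female")):
    """Return {(actual, predicted): count} for every pair of labels."""
    counts = Counter(zip(y_true, y_pred))
    return {(a, p): counts[(a, p)] for a in labels for p in labels}


def accuracy_report(y_true, y_pred, labels=("male", "female")):
    """Overall accuracy plus the per-class share of correct predictions."""
    cm = confusion_counts(y_true, y_pred, labels)
    report = {"overall": sum(cm[(l, l)] for l in labels) / len(y_true)}
    for l in labels:
        actual = sum(cm[(l, p)] for p in labels)  # all samples truly in class l
        report[l] = cm[(l, l)] / actual if actual else float("nan")
    return report


# Made-up labels, purely for illustration.
y_true = ["male", "male", "male", "male", "female", "female", "female", "female"]
y_pred = ["male", "male", "male", "female", "female", "female", "male", "male"]

cm = confusion_counts(y_true, y_pred)
report = accuracy_report(y_true, y_pred)
print(cm)
print(report)
```

Splitting accuracy by class shows whether a model is systematically worse on one gender, which a single overall score can hide.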
Overview
Description
Dataset Comments
Database created to identify a voice as male or female, based upon acoustic properties of the voice and speech.
The dataset consists of 3,168 recorded voice samples, collected from male and female speakers.
The voice samples are pre-processed by acoustic analysis in R using the seewave and tuneR packages, with an analyzed frequency range of 0-280 Hz (human vocal range).
The samples are represented by 21 different features.